Taming Wild Phrases
Abstract
This paper compares the suitability of different document representations for automatic document classification, investigating a whole range of representations between bag-of-words and bag-of-phrases. We look at some of their statistical properties, and determine for each representation the optimal choice of classification parameters and the effect of term selection. Phrases are represented by an abstraction called Head/Modifier (HM) pairs. Rather than just throwing phrases and keywords together, we start with pure HM pairs and gradually add more keywords to the document representation. We use classification on keywords as the baseline, and compare it with the contribution of the pure HM pairs to classification accuracy and with the incremental contributions from heads and modifiers. Finally, we measure the accuracy achieved with all words and all HM pairs combined, which turns out to be only marginally above the baseline. We conclude that even the most careful term selection cannot overcome the differences in document frequency between phrases and words, and propose the use of term clustering to make phrases more cooperative.
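The document-frequency gap mentioned in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy corpus, the hand-written (head, modifier) tuples, and the helper names `build_representation`, `document_frequency` and `mean_df` do not come from the paper, and a real system would obtain HM pairs from a syntactic parser rather than by hand.

```python
from collections import Counter

# Hypothetical toy corpus. Each document is (keywords, HM pairs), where an HM
# pair is written as (head, modifier). The pairs are hand-written here; a real
# system would derive them from a syntactic parser.
corpus = [
    (["automatic", "document", "classification"],
     [("classification", "document"), ("classification", "automatic")]),
    (["document", "representation", "term", "selection"],
     [("representation", "document"), ("selection", "term")]),
    (["term", "clustering", "document", "frequency"],
     [("clustering", "term"), ("frequency", "document")]),
]


def build_representation(hm_pairs, add_heads=False, add_modifiers=False):
    """Start from pure HM pairs and optionally add heads/modifiers as keywords."""
    terms = [f"{head},{modifier}" for head, modifier in hm_pairs]
    if add_heads:
        terms += [head for head, _ in hm_pairs]
    if add_modifiers:
        terms += [modifier for _, modifier in hm_pairs]
    return terms


def document_frequency(documents):
    """Count, for each term, the number of documents that contain it."""
    df = Counter()
    for terms in documents:
        df.update(set(terms))  # set(): each document counts a term once
    return df


def mean_df(df):
    """Average document frequency over all distinct terms."""
    return sum(df.values()) / len(df)


word_df = document_frequency(keywords for keywords, _ in corpus)
pair_df = document_frequency(build_representation(pairs) for _, pairs in corpus)

print(f"mean DF, keywords: {mean_df(word_df):.3f}")
print(f"mean DF, HM pairs: {mean_df(pair_df):.3f}")
```

Even on this tiny corpus the keyword "document" occurs in all three documents, while every HM pair occurs in exactly one, so the pairs have the lower mean document frequency. The `add_heads` and `add_modifiers` flags mimic the idea of starting from pure HM pairs and gradually adding keywords; they are only a stand-in for the representations studied in the paper.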
Similar resources
Personalized Tag Prediction Boosted by BaggTaming: A Case Study of the Hatena Bookmark
We stated a learning problem, which we call taming, and developed a method for this problem in [神嶌 08b, 神嶌 08c, Kamishima 08a]. The learner for this taming requires two types of training data sets, tame and wild. The labels of tame data are highly consistent with a target concept, which we actually want to learn. In contrast, wild data are not so well maintained; thus, some labels are consistent ...
BaggTaming: Learning from Wild and Tame Data
We address a new machine learning problem, taming, that involves two types of training sets: wild data and tame data. These two types of data sets are mutually complementary. A wild data set is less consistent, but is much larger in size than a tame set. Conversely, a tame set has consistency but not as many data. The goal of our taming task is to learn more accurate classifiers by exploiting t...
Taming the Wild in Impartial Combinatorial Games
We introduce a misere quotient semigroup construction in impartial combinatorial game theory, and argue that it is the long-sought natural generalization of the normal-play Sprague-Grundy theory to misere play. Along the way, we illustrate how to use the theory to describe complete analyses of two wild taking and breaking games.
Taming of the Wild Group of Magnetic Translations
We use a theorem of Auslander and Kostant on the representation theory of solvable Lie-groups for the study of some groups necessary for the description of certain quasi-periodic systems of solid-state physics. We show that the magnetic translation group is tame (Type I) if the magnetic field is not constant but fluctuating.
First as Farce, Then as Filmfarsi: Film Adaptation of Shakespeare’s The Taming of the Shrew in Iran
This article is concerned with William Shakespeare’s famous farce play The Taming of the Shrew and its Persian adaptation as an Iranian film called Gorbe ra dame Hejleh Mikoshand in 1969. The point that informs the inquiry is the way the film departs and differs from the play in relation to the issue of women within the patriarchal society. The play and the film will be examined separately in d...
Publication date: 2003